Abstract. The high level of dynamics in today's online social networks (OSNs)
creates new challenges for their infrastructures and providers. In particular,
dynamics involving edge creation have direct implications for strategies for resource
allocation, data partitioning and replication. Understanding network dynamics in
the context of physical time is a critical first step towards a predictive approach
to infrastructure management in OSNs. Despite increasing efforts to study
social network dynamics, current analyses mainly focus on change over time of
static metrics computed on snapshots of social graphs. The limited prior work
models network dynamics with respect to a logical clock. In this paper, we present
results of analyzing a large timestamped dataset describing the initial growth and
evolution of Renren, the leading social network in China. We analyze and model
the burstiness of the link creation process using the second derivative, i.e. the
acceleration, of the degree. This allows us to detect bursts, and to characterize the social
activity of an OSN user as one of four phases: acceleration at the beginning of an
activity burst, where the link creation rate is increasing; deceleration, when the burst is
ending and the link creation process is slowing; cruising, when node activity is in a
steady state; and complete inactivity.
1 Introduction

The rapid growth of online social networks (OSNs) has created numerous technical
challenges for the providers that supply the hardware and software infrastructure
behind these web services. As one example, the creation of social links between users
dramatically changes demands on social network infrastructures in terms of access,
storage and computation. Depending on the specific configuration of backend servers, for
example, changes in the social graph can affect how data is partitioned across clusters,
or how much replication is necessary to sustain low query response times.

However, very little is known about how social network dynamics correspond to
actual clock time. The large majority of prior work on OSN analysis has focused on
analyzing, mining, and modeling static topologies or static snapshots of dynamic
processes. Only recently have researchers begun to study dynamic processes in social
networks, most often by analyzing how classical graph metrics such as degree, connected
components, and shortest paths change over time. This has led to models of underlying
processes such as densification and shrinking diameters [1]. These models describe
how graphs change and how edges are created with respect to a logical clock, i.e. a
homogeneous sequence of events.

But how do these events match up to events in real time? Can we better understand
how edge creation events relate to each other, and can the occurrence of such events
be predicted with respect to a physical clock? This work is an initial effort to answer
some of these questions, by analyzing one specific temporal property: burstiness in
edge creation. Our work is motivated in part by models of human dynamics adopted in
a wide range of disciplines, from economics to communications. Recent studies [2, 3]
have shown that human dynamics are best described by periods of rapidly occurring
events interleaved with long periods of inactivity. Thus we ask the question: Is link
creation in online social networks a bursty process?

In this paper, we provide an initial answer to this question by analyzing an anonymized
temporal trace of edge creation events over a period of a year in a large OSN. The trace
describes events in Renren [4], the largest social network in China with more than 220
million users. Our analysis shows that edge creation is a highly inhomogeneous and
bursty process. We then ask two follow-up questions: a) Given a high-level bursty
structure, does an inner substructure exist, and how can it be characterized; and b) How can
we detect both the whole burst and its internal phases?

Understanding the internal structure of edge creation bursts can shed light on the 
underlying user process, e.g. is the user gradually enlarging her circle of friends or has 
she discovered a new cluster of her offline friends. Known techniques for the analysis 
and the detection of burst events (gamma-ray, text mining, stock market) focus on locat- 
ing a burst when it occurs, but they do not consider events inside the detected temporal 
window. Thus we propose a new methodology able to detect bursts, their internal struc- 
ture and the transitions between the different phases a node experiences. We perform 
a second order analysis on the link creation process by computing, for each node, the 
acceleration of the degree time function to characterize the burst structure. 

Finally, we apply our acceleration metric and the detection of bursty phases on 
Renren. We find that all nodes exhibit similar patterns over time, characterized by an 
intense burst of activity following their joining the network. The initial burst is followed 
by weaker bursts over time, each composed of an acceleration phase, followed by a 
longer period of slowly vanishing deceleration. 

The discovery of highly bursty patterns paves the way for new generative models
that not only capture graph dynamics in terms of phases of node activity, but also
describe such events with respect to physical time. In addition, burst analysis can reveal
further insights into the formation and liveness of individual users and communities, and
provide a basic metric useful in characterizing and comparing different traces of
network dynamics.

2 Related Work 

Time evolving OSN Snapshots. While static features of OSNs are well studied,
work on the dynamics of online social networks is still ongoing. Among others, Leskovec et
al. [1] detected two important properties of dynamic OSN data: graph densification,
i.e. the average degree increases, and shrinking diameter. Several different social graphs
have been studied in order to capture the growth of components and communities. Palla
et al. [6] investigated the time dependence of overlapping communities, and Berlingerio
et al. [5] detected clusters of temporal snapshots of a network, interpreted as eras
of evolution. The authors of [7, 8, 9] studied the dynamics of disconnected components.
Finally, Backstrom et al. [10] investigated the structural features that influence people
in joining communities and their growth process. Alternatively, per-node dynamics
were studied in [11], where the authors captured the evolution of key network
parameters and evaluated the extent to which the edge destination selection process subscribes
to preferential attachment. As concerns acceleration, the authors of [12] considered
overall network size growth as a global property and modeled this global
acceleration for the purpose of predicting the next network stage.

Interdisciplinary Study of Human Dynamics. In [3], Barabási observed that the
timing of human activity is inhomogeneous and bursty, disputing the previous
hypothesis that human activities are randomly distributed in time. The inhomogeneity idea
was extended in [2] and validated on a few datasets, such as a Hungarian news
portal, e-mails, library activities in a university, and trade transactions. Similarly, [14]
analyzes the activity burstiness of blogs using entropy plots, and shows non-uniformity
and self-similarity in the time sequences of the number of posts. Furthermore, [15] and [16]
observe that temporal patterns are inhomogeneous or bursty even in mobile phone calls
and in citation dynamics. Finally, a burst detection algorithm has been developed and used
to observe that the appearance of a topic in a stream of documents, such as e-mails or
research papers, has a bursty behavior. As far as we know, burstiness has never been
investigated in OSNs.

3 Timestamped Renren Dataset 

A major obstacle to studying OSN dynamics is the difficulty of obtaining detailed data
traces. Our study uses an anonymized dataset that contains the timestamped creation of
all users and edges in the Renren social network [4]. Renren is the largest and oldest
OSN in China, with functionality similar to Facebook, and currently has over 220
million users. Our anonymized trace describes the evolution of each node as a sequence of
timestamped edge creation events. The first edge created in the Renren network dates
back to November 2005. We track the complete evolution of the oldest 60,000 nodes in
Renren, for a total of 8 million edges created over a period of one year from November
2005 to December 2006.



4 Bursty Nature of Link Creation

Bursty behavior has been observed in various contexts, such as WWW traffic patterns [17],
email exchanges, and human behavior in general [2]. However, it has not been studied
in the context of online social networks. In this section, we study the link creation
process as the growth of the neighborhood of each single node, and show that the linking
activity of online social network users is characterized by temporal bursty patterns.




Fig. 1. Distribution of the scale parameter $\alpha$, which characterizes the inter-event time
distribution between consecutive link creations for a single individual. $\alpha$ values have been
grouped in bins of length 0.05. Values past the peak around 1 decrease much more slowly than
those on the left side.

Fig. 2. a) Mean, median, and standard deviation (error bar) of $\alpha$ as a function of the final
degree. Mean and median increase with the final node degree. To compute these values, we group
node degrees in bins of 10, and consider their relative $\alpha$ values. b) Mean, median and standard
deviation (error bar) of $\alpha$ as a function of node age measured in weeks. Mean and median
decrease very slowly with node age.



To show the burstiness of link creation, we consider for each user the event time
series where an event is represented by the creation of an edge incident to the
considered node. On each time series, we apply the technique proposed by Vázquez et al. [2]
and extended by McGlohon et al. [14], both based on the inter-event time distribution
between consecutive events for a single individual. If the edge creation process is a
Poisson-like process, i.e. homogeneous, then the inter-event time distribution should be
an exponential distribution. On the other hand, a bursty arrival process is characterized
by a power-law distribution where many short time intervals, each corresponding to
intensive activities forming a burst, are separated by relatively fewer but longer periods
of low or zero activity.

Results. In order to distinguish whether the process is homogeneous or bursty, we fit the
inter-event time data per node in our Renren dataset using MLE (Maximum Likelihood
Estimation), and select the model with the minimum AIC (Akaike Information
Criterion). As a representative of the power-law distribution family, we choose the Pareto
distribution with exponential cutoff, $P(t) = t^{-\alpha} \exp(-t/\lambda)$, and use the
exponential distribution $P(t) = \mu \exp(-\mu t)$ to describe the inter-event times of a
Poisson process. Finally, to avoid the impact of outliers, we remove from consideration users
who have too few events, i.e. nodes with final degree less than 15 (the median degree).

Our results show that the minimum AIC is achieved by the Pareto distribution with
exponential cutoff, meaning almost all users in our dataset manifest a bursty behavior in
link creation. In addition, the Kolmogorov-Smirnov (K-S) test validates the selected
hypothesis for 86% of the users. These measurements offer direct
evidence that at the level of a single individual, there is a heavy-tailed activity pattern.
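
The model selection step can be illustrated with a minimal sketch. It is not the fitting code used for the results above: for brevity it swaps the Pareto distribution with exponential cutoff for a pure power law with a known lower bound `t_min`, so that both maximum likelihood estimates have a closed form. The helper names (`inter_event_times`, `fit_exponential`, `fit_power_law`, `is_bursty`) and the synthetic samples are assumptions made for illustration only.

```python
import math
import random


def inter_event_times(timestamps):
    """Gaps between consecutive edge-creation timestamps of one node.

    Zero gaps (simultaneous events) are dropped so that logarithms are defined.
    """
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:]) if b > a]


def fit_exponential(gaps, t_min):
    """MLE of a shifted exponential p(t) = mu * exp(-mu * (t - t_min)), t >= t_min."""
    n = len(gaps)
    excess = sum(t - t_min for t in gaps)
    mu = n / excess
    return mu, n * math.log(mu) - mu * excess


def fit_power_law(gaps, t_min):
    """MLE of a pure power law p(t) = ((alpha - 1) / t_min) * (t / t_min) ** -alpha."""
    n = len(gaps)
    log_sum = sum(math.log(t / t_min) for t in gaps)
    alpha = 1.0 + n / log_sum
    return alpha, n * math.log((alpha - 1.0) / t_min) - alpha * log_sum


def aic(log_likelihood, n_params=1):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def is_bursty(gaps, t_min):
    """True when the power law reaches a lower AIC than the exponential."""
    _, ll_exp = fit_exponential(gaps, t_min)
    _, ll_pow = fit_power_law(gaps, t_min)
    return aic(ll_pow) < aic(ll_exp)


# Synthetic inter-event times (assumed data, not taken from the Renren trace).
rng = random.Random(0)
heavy_tailed = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(2000)]  # alpha = 2.5
poisson_like = [1.0 + rng.expovariate(1.0) for _ in range(2000)]
print(is_bursty(heavy_tailed, 1.0), is_bursty(poisson_like, 1.0))
```

Fitting the distribution with exponential cutoff instead would require a numerical optimizer, since its normalization constant has no elementary closed form; the AIC comparison itself would be unchanged apart from the number of parameters.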

Having shown that individuals add links in a temporal bursty manner, we analyze
the similarity of the bursty process across users, by computing the distribution of the
scale parameter $\alpha$ determined separately for each user. As shown in Figure 1, $\alpha$ values
are scattered around a peak at 1, with a heavy tail on the right side. This partially
corroborates the results found in [2], which showed a single group of users with very
similar behavior described by the Gaussian distribution of $\alpha$ centered at 1. However,
the heavy tail suggests that users in online social networks cannot be easily grouped in
a single category, but have quite different behaviors in adding links.

To understand the reasons behind the observed heavy tail, we take into account two
factors: the degree and the age of a node, i.e. how long the node has been in the network
(in weeks). In Figure 2 we show the relationship between the scale parameter $\alpha$ and
the two variables we consider. Between $\alpha$ and the degree, we observe that the mean $\alpha$
value increases with degree, as shown in Figure 2(a). This fact suggests that nodes with
higher degree contribute more to the right tail. This means that, although all the nodes
manifest the same bursty behavior, nodes with higher degree have more closely spaced
bursts. With regard to age, Figure 2(b) shows that age does not influence the
right tail, since the mean value is close to the mean of the $\alpha$ distribution, and remains
quite constant for different age values. The small decrease is due to the fact that older
nodes have a greater chance to undergo long periods of inactivity.

In summary, we showed in this section that users follow a bursty process in creating 
links, where bursts occur more frequently in nodes with high final degree. 



5 Degree Acceleration 

Bursty phenomena have been studied in different areas of human activity, such as
clicks or queries in search engines [18]. However, these previous investigations focused
on bursts resulting from aggregate actions, such as groups of users that manifest a
common interest at a certain time. These burst detection algorithms are not suitable to
investigate per-node time sequences of link creation, or substructures inside bursts.

In this section, we propose a new methodology that identifies the different phases that
make up the bursty nature of the link creation process, and detects when bursts
occur. We also identify the role played by each phase during the bursty process. From a
high level, we observe that the alternation of activity/inactivity phases determines the
burstiness of the event trace. In addition, bursts of activity have a typical internal
structure, composed of a rapidly increasing slope and a gradually decreasing phase, possibly
interleaved by a plateau. An example is shown in Figure 3.

Degree Acceleration. Inspired by studies in physics and neuroscience on highly
dynamic systems [19], we investigate the phases in bursty processes and detect bursts by
measuring significant increments and decrements in the number of new links formed per
node. A burst begins when link formation activity rapidly increases, and ends following a
decreasing phase. By leveraging the concept of acceleration, it is possible to easily identify
and quantify significant changes in link creation activity. Let $d_i(t)$ be the degree of node
$i$ at time $t$, i.e. the total number of links incident to node $i$ at time $t$, and let $\Delta t$ be
the time granularity that separates consecutive measurements of $d_i(t)$. We can then compute
degree acceleration as:

$$a_i^d(t) = \frac{d_i(t) - 2\,d_i(t - \Delta t) + d_i(t - 2\Delta t)}{\Delta t^2} \qquad (1)$$

By computing degree acceleration, we can observe the initial start of bursts ($a_i^d \gg 0$)
and a burst's decaying phase ($a_i^d \ll 0$). An example is shown in Figure 3. Note that
acceleration captures two types of steady state conditions: a period of consistently high
activity representing the plateau inside an activity burst (after an initial acceleration
phase), and a steady state of low activity outside of activity bursts.
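
As a small worked example of the second-order difference defined above, the sketch below computes degree acceleration from a cumulative degree series sampled once per time step. The function name `degree_acceleration` and the zero padding before the first sample are assumptions made for illustration: the padding treats the node as having degree 0 before it joins the network.

```python
def degree_acceleration(degrees, dt=1.0):
    """Second-order backward difference of a cumulative degree series.

    degrees[k] is the degree of one node at time k * dt. Degrees before the
    first sample are taken to be 0 (an assumption: the node had no links yet).
    """
    padded = [0, 0] + list(degrees)
    return [
        (padded[k + 2] - 2 * padded[k + 1] + padded[k]) / dt ** 2
        for k in range(len(degrees))
    ]


# A node that creates 5, 5, 1 and 0 new links in four consecutive weeks.
weekly_degree = [5, 10, 11, 11]
print(degree_acceleration(weekly_degree))  # [5.0, 0.0, -4.0, -1.0]
```

With a granularity of one week, the acceleration is simply the change in the number of links created from one week to the next: a jump at the start of the burst, zero on the plateau, and negative values as the burst decays.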
Fig. 3. An example of degree acceleration, computed on a single node from our dataset. The
green dotted line represents the number of links created by the node each week. The red dotted
line represents the acceleration computed according to Eq. (1). In week 39, the node shows a large
acceleration, followed by a plateau. The node decelerates into week 42, when it enters a cruising
phase (link creation is stable) for 4 weeks.

Defining Phases. While exploring the burstiness of the link creation process, we
found that the growth of each node is characterized by transition phases in which users
significantly change their link formation behavior. This led us to identify four different
phases that describe the patterns involved in the nodes' growth processes. The different
phases can be described by defining a time-dependent state variable for each node in the
system. More specifically, the acceleration phase is characterized by a large increment
in creating new links, i.e. $a_i^d \gg 0$, and the deceleration phase is described by a strong
decay measured by $a_i^d \ll 0$. Then we define two intermediate phases: cruising and
inactivity. The first corresponds to a steady state of a node, where the number of links
created per week is almost constant. This phase can correspond both to high activity
or to small oscillations around inactivity, and is characterized by at least one new edge
(captured by the variable $c_i(t) = 1$) and small $a_i^d$ values. These small $a_i^d$ values are
centered around the value $a_i^d = 0$, and are bounded by two thresholds $\theta_1$ and $\theta_2$.
The second phase, i.e. inactivity, occurs when a node does not create any links for
an entire time window. We formalize these four phases by introducing the function
$S_i(t) : \mathbb{R} \to \{acc, dec, cruise, inact\}$ defined as follows:

$$S_i(t) = \begin{cases}
acc & a_i^d(t) \in (\theta_1, +\infty) \\
dec & a_i^d(t) \in (-\infty, \theta_2) \\
cruise & a_i^d(t) \in [\theta_2, \theta_1] \wedge c_i(t) = 1 \\
inact & c_i(t) = 0
\end{cases} \qquad (2)$$

where $c_i(t) = 1$ if and only if node $i$ creates at least one edge at time $t$, otherwise
$c_i(t) = 0$. Degree acceleration $a_i^d(t)$ and the related $S_i(t)$ function represent a general
tool to investigate the burstiness structure, and to highlight the detailed properties of
each phase.
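
The four cases above can be sketched as a small classifier over the number of links a node creates in each week. This is an illustrative sketch rather than the implementation used in our analysis. The function name `classify_phases` is hypothetical, the default thresholds are the values 2 and -2 used in our experimental analysis, and the cases are checked in the order they are listed, which is an assumption: a week with no new links that follows a sharp drop is labelled `dec` rather than `inact`.

```python
def classify_phases(new_links, theta1=2.0, theta2=-2.0):
    """Label each week of one node as 'acc', 'dec', 'cruise' or 'inact'.

    new_links[k] is the number of links created in week k. With a one-week
    granularity, the degree acceleration equals the change in new links
    between consecutive weeks (the node has 0 links before week 0).
    """
    phases = []
    previous = 0
    for created in new_links:
        acceleration = created - previous
        if acceleration > theta1:
            phases.append("acc")
        elif acceleration < theta2:
            phases.append("dec")
        elif created >= 1:  # c_i(t) = 1 and small acceleration
            phases.append("cruise")
        else:  # c_i(t) = 0
            phases.append("inact")
        previous = created
    return phases


print(classify_phases([6, 6, 1, 1, 0, 0]))
# ['acc', 'cruise', 'dec', 'cruise', 'inact', 'inact']
```

The example shows both kinds of cruising: the second week is a plateau inside a burst (six links), while the fourth week is a small oscillation around inactivity (one link).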



6 Experimental Analysis 

In this section, we characterize the link creation process by analyzing our Renren trace
using our acceleration methodology. The experimental analysis has been performed
with the following settings: $\Delta t = 1$ week, to avoid cyclic fluctuations in acceleration
due to the increase in user activity over each weekend, and cruising phase thresholds
$\theta_1 = 2$ and $\theta_2 = -2$.

6.1 The Role of Phases 

The role played by each phase during the node lifetime is a key element in understanding
the network dynamics, and is also crucial when designing generative models based on
per-node temporal behavior. To this end, we consider two main aspects: (i) the time
a node spends in each phase and (ii) the per-node number of links created in the different
phases.



We perform this analysis from two perspectives: the aggregate behavior of all nodes,
and per-node behavior. In order to understand the role of the different
phases during a node's lifetime, we define $\phi^t_{phase}$ and $\psi^t_{phase}(i)$ to compute the
percentage of time spent in each phase by all nodes (Equation 3) and by each node (Equation 4):

$$\phi^t_{phase} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{life(i)} I_{phase}(S_i(t))}{\sum_{i=1}^{N} life(i)} \qquad (3)$$

$$\psi^t_{phase}(i) = \frac{\sum_{t=1}^{life(i)} I_{phase}(S_i(t))}{life(i)} \qquad (4)$$

where $life(i)$ represents the lifetime in weeks of node $i$, $I$ is the indicator function, and
$phase \in \{acc, dec, cruise, inact\}$. $N$ indicates the number of nodes at time $T$, which
represents the last week considered in the dataset.
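
The two time-based measures can be sketched as follows, assuming each node is represented by the list of its weekly phase labels, so that the length of the list is the node's lifetime in weeks. The function names `psi_time` and `phi_time` and the two toy nodes are assumptions made for illustration.

```python
from collections import Counter

PHASES = ("acc", "dec", "cruise", "inact")


def psi_time(phase_sequence):
    """Fraction of one node's lifetime spent in each phase."""
    counts = Counter(phase_sequence)
    life = len(phase_sequence)
    return {p: counts[p] / life for p in PHASES}


def phi_time(phase_sequences):
    """Fraction of all node-weeks, pooled over all nodes, spent in each phase."""
    counts = Counter()
    total_life = 0
    for sequence in phase_sequences:
        counts.update(sequence)
        total_life += len(sequence)
    return {p: counts[p] / total_life for p in PHASES}


# Two toy nodes with lifetimes of 4 and 2 weeks (assumed data).
nodes = {
    "u": ["acc", "dec", "inact", "inact"],
    "v": ["acc", "cruise"],
}
print(psi_time(nodes["u"]))
print(phi_time(nodes.values()))
```

Because the aggregate measure pools node-weeks, long-lived nodes weigh more in it than in an average of the per-node values, which is why the per-node distribution is summarized separately.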

The relationship between link creation and phase is quantified by $\phi^e$, which
corresponds to the percentage of the overall edges created within each phase, and $\psi^e(i)$,
which is the link generation rate for node $i$ in each phase. In these definitions, $m$ is the
number of links at time $T$. The results are reported in Table 1, where $\psi^t_{0.8}$
and $\psi^e_{0.8}$ are the 0.8-quantiles of the distributions of $\psi^t$ and $\psi^e$, and are
discussed below.

Table 1. In the first two rows we report $\phi^e$ (definition 5) and $\phi^t$ (definition 3) values for
each phase. In the last two rows, the 0.8-quantiles of the $\psi^t$ and $\psi^e$ distributions.



Inactivity phase. During the inactivity phase, by definition, we do not observe
growth since no links are created. However, inactivity is important in the temporal
dimension, because it deeply affects burstiness. The high values of $\phi^t$ and $\psi^t_{0.8}$
highlight that node activities are concentrated in a few short periods; thus, for most
of their life, nodes do not influence the dynamic evolution of the network.

Acceleration and Deceleration Phases. Nodes spend only a small amount of their
life in these phases; in particular, acceleration events are followed by longer periods of
weaker activity. However, the links generated in these phases determine the
structure of Renren. In fact, a link has a very high probability, 69%, of being generated in
one of these two phases: 52% in acceleration and 17% in deceleration.

Cruising Phase. Cruising periods cover an important portion of nodes' lifetime.
Furthermore, $\phi^e = 0.31$ and $\psi^e = 0.16$ would suggest that this phase also has a role
in link creation. However, only a few cruising periods are relevant to edge growth:
it depends on whether the cruising phase is inside a burst or corresponds to
small oscillations around inactivity. A node in a burst cruising phase is creating many
links, while in the other case the number of links created is negligible. Finally, the
cruising phase has a pronounced impact only for nodes with low degree, as shown in Figure 4.

We have shown that the acceleration (acc) and deceleration (dec) phases are those
responsible for the growth and dynamics of the network, despite the fact that they represent
a very small part of a node's life.

Fig. 4. Relationship between $\psi^e_{cruise}$ and the degree. $\psi^e_{cruise}$ decreases as the degree
rises, so the cruising phase has a pronounced impact only for nodes with low degree.



6.2 Acceleration and Deceleration Features 

An in-depth understanding of acceleration/deceleration phases reveals how users operate
in the network after they join. This knowledge could be very useful to ensure efficient
management of the OSN's resources. This section focuses on acceleration and
deceleration by illustrating their importance from a network perspective, showing
that they follow a power-law distribution, and finally investigating the impact of node
aging on the link creation process.

Network perspective on acceleration/deceleration. From the network perspective,
an estimate of how many and which nodes are changing the graph structure would
greatly help in managing the system resources. As can be seen in Figure 6, in each
week only a very small number of nodes are in acc/dec phases; for example, at the end
of the year they are almost 20%. These nodes can be easily identified as soon as they
experience a phase transition from the inactive/cruising to the accelerated phase, since
their values of acceleration abruptly increase.
Fig. 5. a) Acceleration CCDF and the resulting fitted distribution ($\alpha = \ldots$). b) Deceleration
CCDF and the resulting fitted distribution ($\alpha = 3.34$).

Fig. 6. For each week, the number of nodes in the network (network size) and the number of
nodes in the acceleration/deceleration phases. In each week only a very small number of nodes
are in the acc/dec phases though the network size rapidly grows.



Acceleration/deceleration probability distributions. By applying the statistical
framework proposed by Clauset et al. [20], we find that the acceleration and deceleration
distributions are power laws (Figures 5(a) and 5(b)). Considering the overall network, this
result implies that half of the acceleration and deceleration events have a small size,
but they are very likely to show rapid increase and decrease respectively. The upper tail
of the acceleration distribution exhibits values so high that they cannot
correspond to normal users. Those events are most likely associated with people with a large
number of followers or with advertising accounts.
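
To make the fitting step more concrete, the sketch below covers two ingredients of that framework: the maximum likelihood estimate of the exponent for a given lower bound, and the choice of the lower bound that minimizes the Kolmogorov-Smirnov distance between the data and the fitted model. It is a simplified illustration, not the code behind Figure 5: it uses the continuous approximation, omits the goodness-of-fit test, and the names `fit_alpha`, `ks_distance`, `fit_power_law` and the `min_tail` parameter are assumptions.

```python
import math
import random


def fit_alpha(tail, x_min):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / x_min))."""
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def ks_distance(tail, x_min, alpha):
    """Largest gap between the empirical CDF of a sorted tail and the model CDF."""
    n = len(tail)
    worst = 0.0
    for k, x in enumerate(tail):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        worst = max(worst, abs(model - k / n), abs(model - (k + 1) / n))
    return worst


def fit_power_law(values, min_tail=200):
    """Scan candidate lower bounds; keep the one with the smallest KS distance.

    Returns (x_min, alpha, ks) or None when fewer than min_tail positive values exist.
    """
    data = sorted(v for v in values if v > 0)
    best = None
    for start in range(len(data) - min_tail + 1):
        if start > 0 and data[start] == data[start - 1]:
            continue  # same candidate lower bound as the previous index
        tail = data[start:]
        x_min = tail[0]
        if tail[-1] == x_min:
            continue  # degenerate tail: all values equal
        alpha = fit_alpha(tail, x_min)
        dist = ks_distance(tail, x_min, alpha)
        if best is None or dist < best[2]:
            best = (x_min, alpha, dist)
    return best


# Synthetic event sizes drawn from a power law with alpha = 3 and x_min = 2 (assumed data).
rng = random.Random(1)
sizes = [2.0 * (1.0 - rng.random()) ** (-0.5) for _ in range(1000)]
print(fit_power_law(sizes))
```

Acceleration values computed from weekly link counts are integers, so a discrete estimator would be the more faithful choice on real data; the continuous form is used here only to keep the sketch short.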

The impact of aging. The general behavior of a node is a sequence of
acceleration/deceleration phases of constant magnitude, after an initial burst. In general, nodes
wait at most one month before initiating their activity.

We start by defining the age $age(t)$ of a node $u$ as the time elapsed between the
appearance of $u$ in the network (the timestamp of the first edge incident to $u$) and time $t$. The
observables whose dependence on age needs to be studied are: $n_{firstAcc/Dec}(t)$, the
number of nodes showing their first acceleration/deceleration at time $t$, and $n_{maxAcc/Dec}(t)$,
the number of nodes manifesting their maximum acceleration/deceleration at time $t$.
Finally, we calculate the average acceleration/deceleration $avgAcc/Dec(age)$.
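
A minimal sketch of how these observables could be tabulated for the acceleration side is given below. It assumes each node is described by its acceleration series indexed by age in weeks, counts only values above the acceleration threshold, and averages the acceleration over the nodes that are accelerating at a given age. These choices, together with the name `aging_observables` and the toy data, are assumptions made for illustration rather than the exact definitions used for Figure 7.

```python
from collections import Counter, defaultdict


def aging_observables(acc_by_node, theta1=2.0):
    """Tabulate first/maximum acceleration ages and the average acceleration per age.

    acc_by_node maps a node id to its acceleration series, where index k is
    the node's age in weeks. Only values above theta1 count as accelerations.
    """
    n_first = Counter()   # age -> nodes whose first acceleration occurs at that age
    n_max = Counter()     # age -> nodes whose largest acceleration occurs at that age
    sums = defaultdict(float)
    counts = Counter()
    for series in acc_by_node.values():
        events = [(age, a) for age, a in enumerate(series) if a > theta1]
        if not events:
            continue
        n_first[events[0][0]] += 1
        n_max[max(events, key=lambda event: event[1])[0]] += 1
        for age, a in events:
            sums[age] += a
            counts[age] += 1
    avg_acc = {age: sums[age] / counts[age] for age in counts}
    return n_first, n_max, avg_acc


# Two toy nodes (assumed data).
toy = {"u": [8, 0, -5, 3, -3], "v": [0, 4, -4, 6, -6]}
n_first, n_max, avg_acc = aging_observables(toy)
print(dict(n_first), dict(n_max), avg_acc)
```

The deceleration counterparts follow by mirroring the comparison, i.e. keeping values below the deceleration threshold and taking the minimum instead of the maximum.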
Fig. 7. a) The times when nodes experience their first acceleration, maximum
acceleration, first deceleration, and maximum deceleration (y-axis on log scale). b) The average
acceleration/deceleration with respect to node age.



Analyzing and comparing $n_{maxAcc}(t)$ and $avgAcc(age)$ in Figures 7(a) and 7(b),
we observe that most nodes enter the phase of maximum acceleration in the first week.
In addition, Figure 7(a) shows that the activity after the first peak does not decrease as
fast as its respective acceleration.

Figure 7(b) highlights another interesting behavior of the acc/dec phases. The
average acceleration remains constant when age increases. This is consistent with what
we found in Figure 3, i.e. nodes experience a big burst of acceleration in the first week
after joining the network, and subsequent bursts never match the first in intensity.



7 Conclusion 



In this paper we investigated the bursty nature of the link creation process in OSNs. We
not only show that it is a highly inhomogeneous process, but also identify patterns of
burstiness common to all nodes. In terms of edge creation, users are inactive for most
of their lifetimes, and concentrate their link activity in a number of short, regular time
periods. To characterize node activity, we develop a new methodology based on the
acceleration of degree growth, which allows us to highlight the internal structure of link
creation bursts.

We believe using acceleration as a general metric to characterize network dynamics 
prompts future work in studying link generation mechanisms. In particular, defining 
different phases of edge creation hints at the possibility of characterizing users into 
distinctive activity levels that correlate with their likelihood of adding social links. Some 
preliminary results confirm this intuition: when nodes (users) first join the network, 
they create links based on the preferential attachment mechanism; while in later bursts, 
nodes seem to explore (acceleration phase) and densify (deceleration) in far regions of 
the graph. These results open the door for new generative models that consider different 
phases of node activity. 